Topic: Temporal Convolutional Networks
What are Temporal Convolutional Networks? Temporal convolutional networks (TCNs) are deep learning models that apply stacks of 1D convolutions, typically dilated and causal, to sequence modeling tasks.
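The core building block is a dilated causal 1D convolution wrapped in a residual connection. A minimal PyTorch sketch (layer sizes are illustrative):

import torch
import torch.nn as nn

class CausalConv1d(nn.Module):
    # 1D convolution that sees only past timesteps (left padding keeps causality)
    def __init__(self, in_ch, out_ch, kernel_size, dilation):
        super().__init__()
        self.pad = (kernel_size - 1) * dilation
        self.conv = nn.Conv1d(in_ch, out_ch, kernel_size, dilation=dilation)
    def forward(self, x):                              # x: (batch, channels, time)
        return self.conv(nn.functional.pad(x, (self.pad, 0)))

class TCNBlock(nn.Module):
    # Residual block with two dilated causal convolutions (cf. Bai et al., 2018)
    def __init__(self, in_ch, out_ch, kernel_size=3, dilation=1):
        super().__init__()
        self.net = nn.Sequential(
            CausalConv1d(in_ch, out_ch, kernel_size, dilation), nn.ReLU(),
            CausalConv1d(out_ch, out_ch, kernel_size, dilation), nn.ReLU())
        self.skip = nn.Conv1d(in_ch, out_ch, 1) if in_ch != out_ch else nn.Identity()
    def forward(self, x):
        return self.net(x) + self.skip(x)

block = TCNBlock(16, 32, dilation=2)
print(block(torch.randn(8, 16, 100)).shape)            # torch.Size([8, 32, 100])

Stacking such blocks with exponentially growing dilations gives a receptive field that covers long histories with few layers.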
Papers and Code
May 15, 2025
Abstract:With the growing demand for healthy foods, agricultural product price forecasting has become increasingly important. Hass avocados, as a high-value crop, exhibit complex price fluctuations influenced by factors such as seasonality, region, and weather. Traditional prediction models often struggle with highly nonlinear and dynamic data. To address this, we propose a hybrid deep learning model, the TCN-MLP-Attention architecture, which combines Temporal Convolutional Networks (TCN) for sequential feature extraction, Multi-Layer Perceptrons (MLP) for nonlinear interactions, and an Attention mechanism for dynamic feature weighting. The dataset used covers over 50,000 records of Hass avocado sales across the U.S. from 2015 to 2018, including variables such as sales volume, average price, time, region, weather, and variety type, collected from point-of-sale systems and the Hass Avocado Board. After systematic preprocessing, including missing value imputation and feature normalization, the proposed model was trained and evaluated. Experimental results demonstrate that the TCN-MLP-Attention model achieves excellent predictive performance, with an RMSE of 1.23 and an MSE of 1.51, outperforming traditional methods. This research provides a scalable and effective approach for time series forecasting in agricultural markets and offers valuable insights for intelligent supply chain management and price strategy optimization.
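The abstract does not specify layer details, but a minimal sketch of how the three stages could be chained in PyTorch (the simple additive attention and all sizes are assumptions, not the paper's design):

import torch
import torch.nn as nn

class TCNMLPAttention(nn.Module):
    def __init__(self, n_feats, hidden=64):
        super().__init__()
        # TCN stage: dilated 1D convs extract sequential features (non-causal here)
        self.tcn = nn.Sequential(
            nn.Conv1d(n_feats, hidden, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(hidden, hidden, 3, padding=4, dilation=4), nn.ReLU())
        # Attention stage: score each timestep, softmax-weight the features
        self.attn = nn.Linear(hidden, 1)
        # MLP stage: nonlinear interactions on the pooled representation
        self.mlp = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    def forward(self, x):                                # x: (batch, time, features)
        h = self.tcn(x.transpose(1, 2)).transpose(1, 2)  # (batch, time, hidden)
        w = torch.softmax(self.attn(h), dim=1)           # (batch, time, 1) weights
        pooled = (w * h).sum(dim=1)                      # weighted sum over time
        return self.mlp(pooled)                          # predicted price

model = TCNMLPAttention(n_feats=6)
print(model(torch.randn(4, 30, 6)).shape)                # torch.Size([4, 1])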

May 14, 2025
Abstract:Event cameras are a novel technology for recording scene information at extremely low latency and with low power consumption. They output a stream of events that encapsulate pixel-level light intensity changes within the scene, capturing information with a higher dynamic range and temporal resolution than traditional cameras. This study investigates the contact-free reconstruction of an individual's cardiac pulse signal from an event recording of their face using a supervised convolutional neural network (CNN) model. An end-to-end model is trained to extract the cardiac signal from a two-dimensional representation of the event stream, with model performance evaluated based on the accuracy of the calculated heart rate. The experimental results confirm that physiological cardiac information in the facial region is effectively preserved within the event stream, showcasing the potential of this novel sensor for remote heart rate monitoring. The model trained on event frames achieves a root mean square error (RMSE) of 3.32 beats per minute (bpm), compared to the RMSE of 2.92 bpm achieved by the baseline model trained on standard camera frames. Furthermore, models trained on event frames generated at 60 and 120 FPS outperformed the 30 FPS standard camera results, achieving RMSEs of 2.54 and 2.13 bpm, respectively.
* This paper is a preprint of a paper submitted to IEEE Access and is currently under review.
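A plausible minimal shape for such a pipeline, with all architectural details assumed rather than taken from the paper: a small 2D CNN encodes each event frame, and 1D convolutions regress the pulse waveform over time.

import torch
import torch.nn as nn

class EventPulseNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Per-frame spatial encoder for 2-channel (positive/negative polarity) event frames
        self.encoder = nn.Sequential(
            nn.Conv2d(2, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1))
        # Temporal head: 1D convs over the frame sequence -> pulse waveform
        self.temporal = nn.Sequential(
            nn.Conv1d(32, 32, 5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 1, 5, padding=2))
    def forward(self, frames):                    # frames: (batch, time, 2, H, W)
        b, t = frames.shape[:2]
        z = self.encoder(frames.flatten(0, 1))    # merge batch and time dims
        z = z.view(b, t, 32).transpose(1, 2)      # (batch, 32, time)
        return self.temporal(z).squeeze(1)        # (batch, time) pulse signal

net = EventPulseNet()
print(net(torch.randn(2, 120, 2, 64, 64)).shape)  # torch.Size([2, 120])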

May 13, 2025
Abstract:Most dynamic graph representation learning methods divide a dynamic graph into discrete snapshots to capture the evolving behavior of nodes over time. Existing methods primarily capture only the local or global structure of each node within a snapshot, using message-passing and random walk-based methods. They then utilize sequence-based models (e.g., transformers) to encode the temporal evolution of node embeddings, and meta-learning techniques to update the model parameters. However, these approaches have two limitations. First, they neglect to extract global and local information simultaneously in each snapshot. Second, they fail to consider the model's performance in the current snapshot during parameter updates, resulting in a lack of temporal dependency management. Recently, the HiPPO (High-order Polynomial Projection Operators) algorithm has gained attention for its ability to optimize and preserve sequence history in State Space Models (SSMs). To address the aforementioned limitations in dynamic graph representation learning, we propose a novel method called Multi-view Dynamic Graph Embeddings with State Space Model Gradient Update (DyGSSM). Our approach combines Graph Convolutional Networks (GCN) for local feature extraction with random walks and a Gated Recurrent Unit (GRU) for global feature extraction in each snapshot. We then integrate the local and global features using a cross-attention mechanism. Additionally, we incorporate an SSM based on the HiPPO algorithm to account for long-term dependencies when updating model parameters, ensuring that model performance in each snapshot informs subsequent updates. Experiments on five public datasets show that our method outperforms existing baseline and state-of-the-art (SOTA) methods in 17 out of 20 cases.
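The cross-attention fusion of local and global node features could look roughly like the following sketch (dimensions and the use of nn.MultiheadAttention are assumptions, not the paper's exact design):

import torch
import torch.nn as nn

class LocalGlobalFusion(nn.Module):
    # Cross-attention: local (GCN) node features attend to global (GRU) features
    def __init__(self, dim=64, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
    def forward(self, local_feats, global_feats):   # both: (num_nodes, dim)
        q = local_feats.unsqueeze(0)                # (1, num_nodes, dim)
        kv = global_feats.unsqueeze(0)
        fused, _ = self.attn(q, kv, kv)             # local queries, global keys/values
        return self.norm(fused.squeeze(0) + local_feats)  # residual + layer norm

fusion = LocalGlobalFusion()
local = torch.randn(100, 64)    # e.g., from a GCN over the snapshot
global_ = torch.randn(100, 64)  # e.g., from random-walk sequences fed to a GRU
print(fusion(local, global_).shape)                 # torch.Size([100, 64])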

May 12, 2025
Abstract:This study focuses on the challenge of predicting network traffic within complex topological environments. It introduces a spatiotemporal modeling approach that integrates Graph Convolutional Networks (GCN) with Gated Recurrent Units (GRU). The GCN component captures spatial dependencies among network nodes, while the GRU component models the temporal evolution of traffic data. This combination allows for precise forecasting of future traffic patterns. The effectiveness of the proposed model is validated through comprehensive experiments on the real-world Abilene network traffic dataset. The model is benchmarked against several popular deep learning methods. Furthermore, a set of ablation experiments is conducted to examine the influence of various components on performance, including changes in the number of graph convolution layers, different temporal modeling strategies, and methods for constructing the adjacency matrix. Results indicate that the proposed approach achieves superior performance across multiple metrics, demonstrating robust stability and strong generalization capabilities in complex network traffic forecasting scenarios.
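A minimal sketch of the GCN-GRU pattern, assuming a precomputed normalized adjacency matrix and illustrative sizes:

import torch
import torch.nn as nn

class GCNGRU(nn.Module):
    # Per-step graph convolution, then a GRU over each node's feature sequence
    def __init__(self, n_feats, hidden=32):
        super().__init__()
        self.w = nn.Linear(n_feats, hidden)      # GCN weight: H = A_hat X W
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)          # next-step traffic per node
    def forward(self, x, a_hat):                 # x: (time, nodes, feats)
        h = torch.relu(a_hat @ self.w(x))        # spatial mixing at every timestep
        h = h.transpose(0, 1)                    # (nodes, time, hidden): nodes as batch
        _, last = self.gru(h)                    # last hidden state: (1, nodes, hidden)
        return self.out(last.squeeze(0))         # (nodes, 1)

n = 12                                           # e.g., Abilene has 12 nodes
a_hat = torch.softmax(torch.randn(n, n), dim=1)  # stand-in for D^-1/2 (A+I) D^-1/2
model = GCNGRU(n_feats=4)
print(model(torch.randn(24, n, 4), a_hat).shape) # torch.Size([12, 1])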

May 12, 2025
Abstract:Monitoring and analyzing electrocardiogram (ECG) signals, even under varying physiological conditions, including those influenced by physical activity, drugs, and stress, is crucial to accurately assess cardiac health. However, current AI-based methods often fail to account for how these factors interact and alter ECG patterns, ultimately limiting their applicability in real-world settings. This study introduces IKrNet, a novel neural network model that identifies drug-specific patterns in ECGs amidst certain physiological conditions. IKrNet's architecture incorporates spatial and temporal dynamics, using a convolutional backbone with varying receptive field sizes to capture spatial features and a bi-directional Long Short-Term Memory module to model temporal dependencies. By treating heart rate variability as a surrogate for physiological fluctuations, we evaluated IKrNet's performance across diverse scenarios, including conditions with physical stress, drug intake alone, and a baseline without drug presence. Our assessment follows a clinical protocol in which 990 healthy volunteers were administered 80 mg of Sotalol, a drug known to be a precursor to Torsades-de-Pointes, a life-threatening arrhythmia. We show that IKrNet outperforms state-of-the-art models in accuracy and stability under varying physiological conditions, underscoring its clinical viability.
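One way to realize a convolutional backbone with varying receptive field sizes feeding a BiLSTM, sketched with assumed kernel sizes and dimensions:

import torch
import torch.nn as nn

class MultiScaleConvBiLSTM(nn.Module):
    # Parallel convs with different receptive fields, then a BiLSTM over time
    def __init__(self, hidden=32):
        super().__init__()
        # Three branches with increasing kernel sizes (choices are assumptions)
        self.branches = nn.ModuleList([
            nn.Conv1d(1, hidden, k, padding=k // 2) for k in (7, 15, 31)])
        self.lstm = nn.LSTM(3 * hidden, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 1)     # drug present / absent logit
    def forward(self, ecg):                      # ecg: (batch, samples)
        x = ecg.unsqueeze(1)                     # (batch, 1, samples)
        feats = torch.cat([torch.relu(b(x)) for b in self.branches], dim=1)
        out, _ = self.lstm(feats.transpose(1, 2))
        return self.head(out[:, -1])             # prediction from last timestep

model = MultiScaleConvBiLSTM()
print(model(torch.randn(4, 5000)).shape)         # torch.Size([4, 1])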

May 12, 2025
Abstract:Event cameras offer significant advantages over traditional frame-based sensors, including microsecond temporal resolution, robustness under varying lighting conditions, and low power consumption. Nevertheless, effective processing of their sparse, asynchronous event streams remains challenging. Existing approaches to this problem fall into two distinct groups. The first processes event data directly with neural models, such as Spiking Neural Networks or Graph Convolutional Neural Networks, but this often comes with a compromise in qualitative performance. The second converts events into dense representations with handcrafted aggregation functions, which can boost accuracy at the cost of temporal fidelity. This paper introduces a novel Self-Supervised Event Representation (SSER) method leveraging Gated Recurrent Unit (GRU) networks to achieve precise per-pixel encoding of event timestamps and polarities without temporal discretisation. The recurrent layers are trained in a self-supervised manner to maximise the fidelity of event-time encoding. Inference is performed with event representations generated asynchronously, ensuring compatibility with high-throughput sensors. Experimental validation demonstrates that SSER outperforms aggregation-based baselines, achieving mAP improvements of 2.4% and 0.6% on the Gen1 and 1 Mpx object detection datasets, respectively. Furthermore, the paper presents the first hardware implementation of recurrent representation for event data on a System-on-Chip FPGA, achieving sub-microsecond latency and power consumption of 1-2 W, suitable for real-time, power-efficient applications. Code is available at https://github.com/vision-agh/RecRepEvent.
* Presented at the Real-time Processing of Image, Depth and Video Information 2025 workshop and to be considered for publication in the SPIE Proceedings.
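The per-pixel recurrent encoding idea can be sketched as follows (the feature size, input encoding, and naive per-event loop are illustrative simplifications, not the paper's implementation):

import torch
import torch.nn as nn

class PixelEventEncoder(nn.Module):
    # Every incoming event updates the hidden state of its pixel only,
    # so the dense representation can be refreshed asynchronously.
    def __init__(self, feat=8):
        super().__init__()
        self.cell = nn.GRUCell(2, feat)          # input: (timestamp, polarity)
    def forward(self, events, h, w):             # events: (N, 4) rows of (x, y, t, p)
        state = torch.zeros(h * w, self.cell.hidden_size)
        for x, y, t, p in events:                # sequential, per-event update
            idx = int(y) * w + int(x)
            inp = torch.stack([t, p]).float().unsqueeze(0)       # (1, 2)
            state[idx] = self.cell(inp, state[idx:idx + 1]).squeeze(0)
        return state.view(h, w, -1)              # dense (H, W, feat) representation

enc = PixelEventEncoder()
ev = torch.tensor([[3.0, 5.0, 0.1, 1.0], [3.0, 5.0, 0.2, -1.0]])
print(enc(ev, h=32, w=32).shape)                 # torch.Size([32, 32, 8])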

May 09, 2025
Abstract:The physical load of jumps plays a critical role in injury prevention for volleyball players. However, manual video analysis of jump activities is time-intensive and costly, requiring significant effort and expensive hardware setups. The advent of the inertial measurement unit (IMU) and machine learning algorithms offers a convenient and efficient alternative. Despite this, previous research has largely focused on either jump classification or physical load estimation, leaving a gap in integrated solutions. This study presents a pipeline to automatically detect jumps and predict their heights using data from a waist-worn IMU. The pipeline leverages a Multi-Stage Temporal Convolutional Network (MS-TCN) to detect jump segments in time-series data and classify the specific jump category. Jump heights are then estimated from the identified segments using three downstream regression machine learning models. Our method is verified on a dataset comprising 10 players and 337 jumps. Compared to VERT (R-squared = -1.53 for height estimation), a commercial device commonly used in jump-landing tasks, our method not only accurately identifies jump activities and their specific types (F1-score = 0.90) but also demonstrates superior performance in height prediction (R-squared = 0.50). This integrated solution offers a promising tool for monitoring physical load and mitigating injury risk in volleyball players.
* Submitted to the EMBC 2025 conference (accepted).
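A minimal sketch of the multi-stage TCN pattern, in which each stage refines the previous stage's frame-wise predictions (layer counts and sizes are assumptions):

import torch
import torch.nn as nn

class DilatedStage(nn.Module):
    # One TCN stage: a stack of residual dilated 1D convs -> per-frame class logits
    def __init__(self, in_ch, n_classes, hidden=64, layers=6):
        super().__init__()
        self.inp = nn.Conv1d(in_ch, hidden, 1)
        self.convs = nn.ModuleList([
            nn.Conv1d(hidden, hidden, 3, padding=2 ** i, dilation=2 ** i)
            for i in range(layers)])
        self.out = nn.Conv1d(hidden, n_classes, 1)
    def forward(self, x):
        h = self.inp(x)
        for conv in self.convs:
            h = h + torch.relu(conv(h))          # residual dilated convolutions
        return self.out(h)

class MSTCN(nn.Module):
    # Multi-stage TCN: later stages refine earlier frame-wise predictions
    def __init__(self, n_feats, n_classes, stages=3):
        super().__init__()
        self.stages = nn.ModuleList(
            [DilatedStage(n_feats, n_classes)] +
            [DilatedStage(n_classes, n_classes) for _ in range(stages - 1)])
    def forward(self, x):                        # x: (batch, feats, time) IMU signals
        logits = self.stages[0](x)
        for stage in self.stages[1:]:
            logits = stage(torch.softmax(logits, dim=1))
        return logits                            # (batch, n_classes, time)

model = MSTCN(n_feats=6, n_classes=4)            # e.g., 6 IMU channels, 4 jump types
print(model(torch.randn(2, 6, 500)).shape)       # torch.Size([2, 4, 500])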

May 11, 2025
Abstract:Diabetes remains a significant health challenge globally, contributing to severe complications like kidney disease, vision loss, and heart issues. The application of machine learning (ML) in healthcare enables efficient and accurate disease prediction, offering avenues for early intervention and patient support. Our study introduces an innovative diabetes prediction framework, leveraging both traditional ML techniques, such as Logistic Regression, SVM, Naïve Bayes, and Random Forest, and advanced ensemble methods like AdaBoost, Gradient Boosting, Extra Trees, and XGBoost. Central to our approach is the development of a novel model, DNet, a hybrid architecture combining Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) layers for effective feature extraction and sequential learning. The DNet model comprises an initial convolutional block for capturing essential features, followed by a residual block with skip connections to facilitate efficient information flow. Batch Normalization and Dropout are employed for robust regularization, and an LSTM layer captures temporal dependencies within the data. Using a Kaggle-sourced real-world diabetes dataset, our model evaluation spans cross-validation accuracy, precision, recall, F1 score, and ROC-AUC. Among the models, DNet demonstrates the highest efficacy, with an accuracy of 99.79% and an AUC-ROC of 99.98%, establishing its potential for superior diabetes prediction. This robust hybrid architecture showcases the value of combining CNN and LSTM layers and emphasizes its applicability in medical diagnostics and disease prediction tasks.
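A rough sketch of the described conv block / residual block / LSTM stacking, with all sizes and the treatment of tabular features as a 1D sequence assumed for illustration:

import torch
import torch.nn as nn

class DNetSketch(nn.Module):
    # Conv block -> residual block with skip connection -> LSTM -> classifier
    def __init__(self, n_feats, ch=32):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, ch, 3, padding=1), nn.BatchNorm1d(ch), nn.ReLU())
        self.res = nn.Sequential(
            nn.Conv1d(ch, ch, 3, padding=1), nn.BatchNorm1d(ch), nn.ReLU(),
            nn.Conv1d(ch, ch, 3, padding=1), nn.BatchNorm1d(ch))
        self.drop = nn.Dropout(0.3)
        self.lstm = nn.LSTM(ch, ch, batch_first=True)
        self.head = nn.Linear(ch, 1)
    def forward(self, x):                    # x: (batch, n_feats) tabular record
        h = self.conv(x.unsqueeze(1))        # treat features as a 1D "sequence"
        h = torch.relu(h + self.res(h))      # skip connection
        h = self.drop(h).transpose(1, 2)     # (batch, n_feats, ch)
        out, _ = self.lstm(h)
        return self.head(out[:, -1])         # diabetes logit

model = DNetSketch(n_feats=8)                # e.g., 8 clinical features
print(model(torch.randn(16, 8)).shape)       # torch.Size([16, 1])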

May 07, 2025
Abstract:Irregular multivariate time series (IMTS) are prevalent in real-world applications across many fields, where varying sensor frequencies and asynchronous measurements pose significant modeling challenges. Existing solutions often rely on a pre-alignment strategy to normalize data, which can distort intrinsic patterns and escalate computational and memory demands. Addressing these limitations, we introduce STRGCN, a Spatio-Temporal Relational Graph Convolutional Network that avoids pre-alignment and directly captures the complex interdependencies in IMTS by representing them as a fully connected graph. Each observation is represented as a node, allowing the model to handle misaligned timestamps effectively by mapping all inter-node relationships, thus faithfully preserving the asynchronous nature of the data. Moreover, we enhance this model with a hierarchical "Sandwich" structure that strategically aggregates nodes to optimize graph embeddings, reducing computational overhead while maintaining detailed local and global context. Extensive experiments on four public datasets demonstrate that STRGCN achieves state-of-the-art accuracy with competitive memory usage and training speed.
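Representing every observation as a node and message-passing over the fully connected graph can be sketched as below (the embeddings and dot-product attention are assumptions; the hierarchical "Sandwich" aggregation is omitted):

import torch
import torch.nn as nn

class ObservationGraphLayer(nn.Module):
    # Each observation (value, timestamp, variable id) is a node; one round of
    # message passing over the fully connected graph via all-pairs attention.
    def __init__(self, n_vars, dim=32):
        super().__init__()
        self.var_emb = nn.Embedding(n_vars, dim)   # which sensor produced it
        self.val_time = nn.Linear(2, dim)          # (value, timestamp) -> dim
        self.attn = nn.MultiheadAttention(dim, 4, batch_first=True)
    def forward(self, values, times, var_ids):     # each: (n_obs,), unaligned
        nodes = self.var_emb(var_ids) + self.val_time(
            torch.stack([values, times], dim=-1))  # (n_obs, dim) node features
        nodes = nodes.unsqueeze(0)                 # single graph as a batch of 1
        out, _ = self.attn(nodes, nodes, nodes)    # all-pairs message passing
        return out.squeeze(0)

layer = ObservationGraphLayer(n_vars=5)
vals, ts = torch.randn(40), torch.rand(40).sort().values
print(layer(vals, ts, torch.randint(0, 5, (40,))).shape)  # torch.Size([40, 32])

Because every observation is its own node, misaligned timestamps need no resampling or imputation before modeling.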

May 08, 2025
Abstract:Grain growth simulation is crucial for predicting the evolution of metallic material microstructure during annealing and the resulting final mechanical properties, but traditional partial differential equation-based methods are computationally expensive, creating bottlenecks in materials design and manufacturing. In this work, we introduce a machine learning framework that combines a Convolutional Long Short-Term Memory network with an Autoencoder to efficiently predict grain growth evolution. Our approach captures both spatial and temporal aspects of grain evolution while encoding high-dimensional grain structure data into a compact latent space for pattern learning, enhanced by a novel composite loss function combining Mean Squared Error, Structural Similarity Index Measure, and Boundary Preservation to maintain the structural integrity of the predicted grain boundary topology. Results demonstrate that our machine learning approach accelerates grain growth prediction by up to 89x, reducing computation time from 10 minutes to approximately 10 seconds while maintaining high-fidelity predictions. The best model (S-30-30) achieved a structural similarity score of 86.71% and a mean grain size error of just 0.07%. All models accurately captured grain boundary topology, morphology, and size distributions. This approach enables rapid microstructural prediction for applications where conventional simulations are prohibitively time-consuming, potentially accelerating innovation in materials science and manufacturing.
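A sketch of such a composite loss, using a single-window SSIM and a finite-difference boundary term (the weights and both simplifications are assumptions):

import torch
import torch.nn.functional as F

def ssim_global(a, b, c1=0.01 ** 2, c2=0.03 ** 2):
    # Single-window SSIM over the whole image (a simplification of windowed SSIM)
    mu_a, mu_b = a.mean(), b.mean()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a ** 2 + mu_b ** 2 + c1) * (a.var() + b.var() + c2))

def boundary_loss(pred, target):
    # Penalize mismatched spatial gradients, where grain boundaries live
    dx = lambda t: t[..., :, 1:] - t[..., :, :-1]
    dy = lambda t: t[..., 1:, :] - t[..., :-1, :]
    return F.mse_loss(dx(pred), dx(target)) + F.mse_loss(dy(pred), dy(target))

def composite_loss(pred, target, w=(1.0, 0.5, 0.5)):   # weights are assumptions
    return (w[0] * F.mse_loss(pred, target)
            + w[1] * (1 - ssim_global(pred, target))   # SSIM is a similarity -> 1 - SSIM
            + w[2] * boundary_loss(pred, target))

pred, target = torch.rand(4, 1, 64, 64), torch.rand(4, 1, 64, 64)
print(composite_loss(pred, target))                    # scalar training loss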
